Efficient maintenance of job prioritization for profit maximization in cloud service delivery infrastructures

ABSTRACT

Systems and methods are disclosed for efficient maintenance of job prioritization for profit maximization in cloud-based service delivery infrastructures with multi-step cost structure support by breaking multiple steps in the SLA of a job into corresponding cost steps; generating a segmented cost function for each cost step; creating a cost-based-scheduling (CBS)-priority value associated with a validity period for each segment based on the segmented cost function; and choosing the job with the highest CBS priority value.

This application claims priority to U.S. Provisional Application Ser. Nos. 61/294,246 and 61/294,254, both filed on Jan. 12, 2010, the contents of which are incorporated by reference.

BACKGROUND

This application relates to Constraint-Conscious Optimal Scheduling for Cloud Infrastructures.

Cloud computing has emerged as a promising computing platform with its on-demand scaling capabilities. Typically, a cloud service delivery infrastructure is used to deliver services to a diverse set of clients sharing the computing resources. By providing on-demand scaling capabilities without any large upfront investment or long-term commitment, it is attracting a wide range of users, from web applications to Business Intelligence applications. The database community has also shown great interest in exploiting this new platform for scalable and cost-efficient data management. Arguably, the success of cloud-based services depends on two main factors: quality of service, which is specified through Service Level Agreements (SLAs), and operating cost management.

Users of cloud computing services are not only able to significantly reduce their IT costs and turn their capital expenditures into operational expenditures, but are also able to speed up their innovation capabilities thanks to on-demand access to vast IT resources in the cloud. While cloud computing offers clients all these advantages, it creates a number of challenges for cloud service providers who try to create successful businesses: they have to handle diverse and dynamic workloads in a highly price-competitive way to convince potential clients to use the service delivery model instead of in-house hosting of IT functions. In addition, the quality of service should be comparable in all aspects to the capabilities that can be delivered from an IT infrastructure under full control of the clients. Thus, the success of cloud-based services arguably depends on two major factors: quality of service, which is captured as Service Level Agreements (SLAs), and operational cost management.

The consistent delivery of services within SLAs is crucial for sustained revenue for the service provider. Delivering those services incurs operational costs, and the difference between the revenue and the operational costs is the service provider's profit, which is required for any commercially viable business.

The total profit, P, of the cloud service provider is defined as P = Σ_i r_i − C, where r_i is the revenue that can be generated by delivering the service for a particular job i and C is the operational cost of running the service delivery infrastructure. The revenue, R, is defined for each job class in the system. Each client may have multiple job classes based on the contract. A stepwise function is used to characterize the revenue as shown in FIG. 1. Intuitively, the clients agree to pay varying fee levels for corresponding service levels delivered for a particular class of requests, i.e., job classes in their contracts. For example, the client may be willing to pay a higher rate for lower response times. As shown in FIG. 1, the client pays R₀ as long as the response time is between 0 and X₁, pays R₁ for the interval between X₁ and X₂, and so on. This characterization allows a more intuitive interpretation of SLAs with respect to revenue generation. Once the revenue function is defined, it in turn defines a cost function, called the SLA cost function. If the level of service changes, the amount that the provider can charge the client also changes according to the contract. Due to the limitations on the availability of infrastructure resources, the cloud service provider may not be able, or may not choose, to attend to all client requests at the highest possible service levels. Lowering or raising service levels causes a corresponding loss or increase in revenue. The loss of potential revenue corresponds to the SLA cost. For example, there is no revenue loss, hence no SLA penalty cost, as long as the response time is between 0 and X₁ in FIG. 1. Likewise, increasing the amount of infrastructure resources to increase service levels results in increased operational cost. As a result, the key problem for the provider is to come up with optimal service levels that will maximize its profits based on the agreed-upon SLAs.
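The relationship between the stepwise revenue function, the SLA cost, and the profit can be illustrated with a minimal Python sketch. The function names, the list-based encoding of the steps, and the sample numbers are assumptions made for illustration only; the step intervals are taken to be closed on the right, so a response time exactly equal to X₁ still earns R₀.

```python
from bisect import bisect_left

def step_value(boundaries, values, response_time):
    """Stepwise lookup: values[k] applies on (boundaries[k-1], boundaries[k]].

    boundaries = [X1, X2, ...] in increasing order,
    values has one more entry than boundaries (R0, R1, R2, ...).
    """
    return values[bisect_left(boundaries, response_time)]

def sla_cost(boundaries, revenues, response_time):
    """SLA cost: revenue lost relative to the best service level R0."""
    return revenues[0] - step_value(boundaries, revenues, response_time)

def total_profit(job_revenues, operational_cost):
    """P = sum_i r_i - C."""
    return sum(job_revenues) - operational_cost

# Hypothetical contract: R0=10 up to 100 ms, R1=6 up to 500 ms, R2=0 afterwards.
X = [100, 500]
R = [10, 6, 0]
print(sla_cost(X, R, 50), sla_cost(X, R, 300), sla_cost(X, R, 900))
```

In this encoding, the SLA cost is zero for any response time in the first step, which matches the statement that no penalty is incurred between 0 and X₁.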

SLAs in general may be defined in terms of various criteria, such as service latency, throughput, consistency, security, etc. One embodiment focuses on service latency, or response time. Even with latency alone, there can be multiple specification methods:

-   Mean-value-based SLA (MV-SLA): For each job class, quality of service is measured based on mean response time. This is the least robust type of SLA from the customers' perspective.
-   Tail-distribution-based SLA (TD-SLA): For each job class, quality of service is measured in terms of the portion of jobs finished by a given deadline. For instance, a user may want 99% of jobs to be finished within 100 ms.
-   Individual-job-based SLA (IJ-SLA): Quality of service is measured using the response time of individual jobs. Unlike MV-SLA or TD-SLA above, in IJ-SLA any single job with a poor service quality immediately affects the measured quality of service and incurs some SLA penalty cost.

For each specification method, the SLA can be classified either as a hard SLA or a soft SLA as follows.

-   Hard SLA: A hard SLA has a single hard deadline to meet, and if the deadline is missed, it is counted as a violation. The definition of this type of SLA, or constraint, may come from the client or the cloud service provider. There are cases where a cloud provider needs to use hard SLAs as a tool to control various business objectives, e.g., controlling the worst-case user experience. Therefore, the violation of a hard SLA may not correspond to financial terms in the client contracts.
-   Soft SLA: A soft SLA corresponds to agreed levels of service in the contract. This is different from the hard SLA in that even after the violation, the SLA penalty cost may continue to increase as response time further increases. Although the SLA penalty cost may have various shapes, a stepwise function is a natural choice used in real-world contracts.

The unit of operational cost is a server cost per hour. Consequently, the total operational cost, C, is the sum of individual server costs for a given period of time. The individual server cost is the aggregation of all specific cost items that are involved in operating a server, such as energy, administration, and software, among others. Conventional scheduling systems typically rely on techniques that do not primarily consider profit maximization. These techniques mainly focus on optimizing metrics such as average response time.

SUMMARY

In a first aspect, systems and methods are disclosed to schedule jobs in a cloud computing infrastructure by receiving in a first queue jobs with deadlines or constraints specified in a hard service level agreement (SLA); receiving in a second queue jobs with a penalty cost metric specified in a soft SLA; and minimizing both constraint violation count and total penalty cost in the cloud computing infrastructure by identifying jobs with deadlines in the first queue and delaying jobs in the first queue within a predetermined slack range in favor of jobs in the second queue to improve the penalty cost metric.

In a second aspect, systems and methods are disclosed for efficient maintenance of job prioritization for profit maximization in cloud-based service delivery infrastructures with multi-step cost structure support by breaking multiple steps in the SLA of a job into corresponding cost steps; generating a segmented cost function for each cost step; creating a cost-based-scheduling (CBS)-priority value associated with a validity period for each segment based on the segmented cost function; and choosing the job with the highest CBS priority value.

Advantages of the preferred embodiments may include one or more of the following. The system provides a very efficient job prioritization for diverse pricing agreements across diverse clients and heterogeneous infrastructure resources. In cloud computing infrastructures, the system enables profit optimization, which is a vital economic indicator for sustainability. The system considers discrete levels of costs corresponding to varying levels of service, which is more realistic in many real-life systems. The system is also efficient, with computational complexity low enough to be feasible for high-volume, large infrastructures.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an exemplary system diagram of an Intelligent Cloud Database Coordinator (ICDC).

FIG. 2 shows an exemplary cost function segmentation in iCBS.

FIG. 3 shows a Constraint-Conscious Optimization Scheduling (CCOS) system.

FIG. 4 shows an exemplary slack tree used with the CCOS system.

FIG. 5 shows an example where a new job is inserted into the slack tree.

FIG. 6 shows an exemplary process for prioritization and scheduling of incoming jobs based on cost functions.

FIG. 7 shows an exemplary process to schedule jobs prioritized in FIG. 6.

DESCRIPTION

FIG. 1 shows an exemplary system diagram of the ICDC. The ICDC manages very large cloud service delivery infrastructures. The system architecture focuses on components that are relevant and subject to optimization to achieve the goal of SLA-based profit optimization of resource and workload management in the cloud databases. The use of distinctively optimizing individual system components with a global objective in mind provides a greater degree of freedom to customize operations. This approach yielded higher degrees of performance, customizability based on variable business requirements, and end-to-end profit optimization.

In one embodiment, clients 10 communicate with ICDC using a standard JDBC API and make plain JDBC method calls to talk to various databases without changing code. The clients 10 communicate with a query router 20. An autoscaler 30 monitors the queue length log and query response time log and determines if additional nodes should be added by an add/drop controller 40. The controller issues commands to add/drop nodes to a database replication cluster 50 such as a MySQL replication cluster. Although the system of FIG. 1 shows specific product names, such as MySQL and Active MQ, the system is not limited to those products. For example, MySQL can be replaced with other database products such as Oracle, among others.

The ICDC has a Client Data Module that is responsible for maintaining client-specific data such as cost functions and SLAs, which are derived from client contracts. Once captured, this information is made available to other system modules for resource and workload management purposes. An ICDC Manager monitors the status of the system, e.g., system load, queue lengths, query response time, and CPU and I/O utilization. All this information is maintained by the System Data module. Based on system monitoring data, the ICDC Manager directs the Cluster Manager to add servers to or remove servers from the Resource Pool to optimize the operational cost while keeping the SLA costs in check. The ICDC Manager also provides the dispatcher and scheduler modules with the dynamic system data. An Online Simulator is responsible for dynamic capacity planning. It processes the client data and dynamic system data to assess optimum capacity levels through simulation. It has capabilities to run simulations in both offline and online modes. A Dispatcher takes incoming client calls and immediately forwards the queries (or jobs) to servers based on the optimized dispatching policy. The dispatching policy is constantly tuned according to dynamic changes in the system, such as user traffic and the addition or removal of processing nodes. A Scheduler decides the order of execution of jobs at each server. After the client requests are dispatched to individual servers based on the dispatching policy, individual scheduler modules are responsible for prioritization of dispatched jobs locally by forming a queue of queries, from which a query is chosen and executed in the database. The choice of which query to execute first makes a difference in the SLA penalty costs observed.

The system uses an SLA-based profit optimization approach for building and managing a data management platform in the cloud. The problem of resource and workload management is addressed for a data management platform that is hosted on an Infrastructure-as-a-Service (IaaS) offering, e.g., Amazon EC2. The data management platform can be thought of as a Platform-as-a-Service (PaaS) offering that is used by Software-as-a-Service (SaaS) applications in the cloud.

In the system model, each server node represents a replica of a database. When a query (job) arrives, a dispatcher immediately assigns the query to a server among multiple servers, according to a certain dispatching policy; for each server, a resource scheduling policy decides which query to execute first, among those waiting in the associated queue; and a capacity planning component is in charge of determining how many resources (i.e., database servers) to allocate in the system. With this abstraction, the system optimizes three tasks: query dispatching, resource scheduling, and capacity planning.

Next, the scheduling component of the ICDC system is discussed. The Scheduler has two distinct features: it is cost sensitive and constraint conscious. In one embodiment using a conventional heuristic cost-based scheduling called CBS, the system evaluates the priorities of the n jobs in the queue individually, each in constant time, and picks the job with the highest priority. To efficiently evaluate the priority of job i, CBS considers two possible cases: i) the job is served immediately at current time t, which will incur a cost of c_i(t), where c_i(t) is the cost function of job i with the queue wait time t, and ii) the job gets delayed by a wait time, τ, and is then served, which will cause the cost of c_i(t+τ). Since the value of τ is not known, CBS uses a probability density function and computes the expected cost based on that. Thus, the CBS priority for a job i is,

$p_{i}(t) = \int_{0}^{\infty} a(\tau) \cdot c_{i}(t + \tau)\, d\tau - c_{i}(t)$  (1)

where a(τ) is a probability distribution that models the waiting time of a job in the queue if it is not served immediately. After computing the p_i(t) value, the system divides it by the job's service time, since a longer job occupies the server for a longer time, delaying other jobs for a longer period. The exponential function a(τ) = (1/β)·e^(−τ/β) works well, with β = 1.4.
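For a stepwise cost function, Equation 1 has a simple closed form when the waiting-time density is assumed to be exponential with mean β, i.e., a(τ) = (1/β)·e^(−τ/β): each future step boundary x contributes its cost jump weighted by the probability e^(−(x−t)/β) that the job is still waiting when the boundary is crossed. The Python sketch below works under that assumption; the function name, the list encoding of the steps, and the time units are illustrative and are not taken from the original system.

```python
import math

BETA = 1.4  # value quoted in the text; its time unit is not specified there

def cbs_priority(boundaries, costs, t, service_time, beta=BETA):
    """CBS priority of a job with a stepwise cost function at wait time t.

    costs[k] applies for wait times in (boundaries[k-1], boundaries[k]],
    so len(costs) == len(boundaries) + 1.
    """
    expected_increase = 0.0
    for k, x in enumerate(boundaries):
        if x >= t:  # boundary not yet crossed at wait time t
            jump = costs[k + 1] - costs[k]
            # P(wait exceeds x | waited t) under the exponential assumption
            expected_increase += jump * math.exp(-(x - t) / beta)
    # Normalize by service time: longer jobs delay the others for longer.
    return expected_increase / service_time

# One-step example: no cost up to wait time 2.0, then a cost of 10.
print(cbs_priority([2.0], [0.0, 10.0], t=0.0, service_time=1.0))
```

The priority grows as t approaches a cost step and drops to zero once every step has been passed, which is the behavior the heuristic relies on.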

Because CBS examines all the jobs in the queue in order to pick the next job to serve, CBS can be slow in systems where queues can grow very long and job service time can be very short. Another embodiment uses an efficient version of CBS called iCBS (incremental CBS). iCBS uses a priority queue to maintain the list of jobs according to their priority and dynamically updates the priority queue when a new job arrives or an existing one is removed from the queue. Because the priority queue is maintained incrementally, iCBS has a logarithmic time complexity.

The iCBS system breaks multiple steps of a cost function into multiple cost functions, as shown in FIG. 2. Each of the segmented cost functions has its own validity period, (x₁, x₂], and it is used to compute the priority of the corresponding job for times x₁ < x ≤ x₂. Segmentation is done by removing and pushing down steps as follows. The first segment is the same as the original cost function, and its validity period is the duration of the first step, i.e., (0, x₁]. The second segment is obtained by removing the first step and pushing down the rest of the steps by the second step's cost (or y-value). Its validity period is the duration of the second step, i.e., (x₁, x₂]. This is repeated until the last step is reached, where the cost is defined as zero for its validity period, which is the duration of the last step, i.e., (x₂, ∞) in the example.
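The remove-and-push-down procedure can be sketched in a few lines of Python. The dictionary layout and the function name are assumptions made for illustration, and the sketch assumes the first step carries zero cost (as in the no-penalty interval of a stepwise SLA), so that the first segment coincides with the original cost function.

```python
import math

def segment_cost_function(boundaries, costs):
    """Split a stepwise cost function into one segment per step.

    costs[k] applies on (boundaries[k-1], boundaries[k]]; segment k drops the
    first k steps and pushes the remaining ones down by costs[k].
    """
    starts = [0.0] + list(boundaries)
    ends = list(boundaries) + [math.inf]
    segments = []
    for k in range(len(costs)):
        segments.append({
            "valid_from": starts[k],            # exclusive
            "valid_to": ends[k],                # last segment never expires
            "boundaries": list(boundaries[k:]),
            "costs": [c - costs[k] for c in costs[k:]],
        })
    return segments

# Three-step example: cost 0 until x1=10, 5 until x2=20, then 8.
for seg in segment_cost_function([10.0, 20.0], [0.0, 5.0, 8.0]):
    print(seg)
```

The last segment has no remaining boundaries and a single cost of zero, matching the description of the final step.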

As a(τ) in Equation 1 follows an exponential distribution, the relative priority order between the valid segments of two jobs remains unchanged over time, as long as the segments are still valid. The iCBS process decreases newly arrived jobs' priorities, instead of increasing the existing jobs' priorities, to avoid modification of existing jobs' priorities in the queue, while keeping the relative order the same.

In the iCBS process, for new job arrivals, one segmented cost function is generated for each cost step, and a segmented-CBS-priority associated with the validity period of each segment is generated. Then, each priority value is divided by e^((t−t₀)/a), where t is the current time and t₀ is a fixed time instance, such as system time zero. The segmented-CBS-priority objects are inserted into a priority queue, where the objects are ordered by CBS priority. Among all segments corresponding to the same job, segment i will always have higher CBS priority than segment j, where i<j. The system also adds a nextSegment pointer from the segmented-CBS-priority object of segment i to that of segment i+1, to chain the segments of the same job.

For job scheduling, at the time of picking the next job to serve, the head of the segmented-CBS-priority queue, which has the highest priority value, is pulled and checked to see whether its validity period has expired or the corresponding job has already been scheduled through an earlier segment. In either case, the segment is thrown away, and the next head of the priority queue is pulled, until the system finds a segmented-CBS-priority that has an unexpired validity period and whose job has not been scheduled yet. When found, the system marks the other segments for the same job in the priority queue as scheduled, using the nextSegment pointer.
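The insertion and pull logic can be sketched with Python's `heapq` module. This is an illustrative sketch, not the original implementation: the class and method names are hypothetical, per-segment priorities are assumed to be computed elsewhere and passed in, validity ends are expressed as absolute times, and a set of scheduled job identifiers stands in for the nextSegment pointer chain.

```python
import heapq
import math

class ICBSQueue:
    def __init__(self, a, t0=0.0):
        self.a, self.t0 = a, t0   # scaling constant and fixed reference time
        self.heap = []            # entries: (-scaled_priority, seq, job_id, valid_to)
        self.scheduled = set()    # stand-in for marking segments via nextSegment
        self.seq = 0              # tie-breaker so heap entries always compare

    def add_job(self, job_id, now, segments):
        """segments: (priority_at_arrival, valid_to) pairs, one per cost step."""
        scale = math.exp((now - self.t0) / self.a)
        for priority, valid_to in segments:
            # Scale the newcomer down instead of scaling existing entries up.
            heapq.heappush(self.heap, (-priority / scale, self.seq, job_id, valid_to))
            self.seq += 1

    def next_job(self, now):
        """Pop segments until one is unexpired and belongs to an unscheduled job."""
        while self.heap:
            _, _, job_id, valid_to = heapq.heappop(self.heap)
            if now > valid_to or job_id in self.scheduled:
                continue          # expired segment, or job already served
            self.scheduled.add(job_id)
            return job_id
        return None

q = ICBSQueue(a=1.4)
q.add_job("A", 0.0, [(5.0, 10.0), (2.0, math.inf)])
q.add_job("B", 0.0, [(3.0, math.inf)])
print(q.next_job(1.0), q.next_job(1.0), q.next_job(1.0))
```

Each push and pop costs O(log n) in the number of queued segments, which is where the logarithmic complexity comes from. Very large values of (now − t0)/a would overflow the exponential, so a practical system would presumably reset the reference time periodically; the text does not describe that detail.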

In Constraint-Conscious Optimization Scheduling, iCBS achieves near-optimal cost, but it is prone to starvation as its sole objective is cost minimization. In the real world, however, this may not be desirable. For instance, service providers may want to provide a certain bottom-line performance guarantee for all jobs, such as a guarantee that all jobs can be finished within ten times the job service time. Also, it may be desired to provide a worst-case performance guarantee for selected VIP customers. These types of hard SLAs need to be enforced, on top of soft SLAs that affect SLA costs.

To meet such needs, a scheduling embodiment of FIG. 3 manages hard SLAs, i.e., deadlines or constraints, and soft SLAs, i.e., an optimization metric. This embodiment optimizes the metric while achieving (near-) minimal possible constraint violation. As violation of constraints or deadlines may be unavoidable in general (jobs may arrive in a bursty fashion), the system tries to make the minimum possible number of violations. Only a subset of jobs may have deadlines, which may happen as in the VIP example above. The optimization metric can be the average response time or the above-discussed average cost.

FIG. 3 shows a Constraint-Conscious Optimization Scheduling (CCOS) system. CCOS employs a dual-queue approach: 1) an opti-queue 110 is an optimization queue where all jobs are queued. SJF is used without modification if response time minimization is the optimization goal, and iCBS is used if cost minimization is the goal; and 2) a constraint-queue 120 employs an EDF (Earliest Deadline First) process. Only the jobs with deadlines are queued here. FIG. 3's CCOS balances between the following two extremes: 1) ignore deadlines (always schedule jobs from opti-queue, achieving the best cost-based results, with uncontrolled deadline violation); and 2) blindly pursue violation control (schedule jobs from constraint-queue whenever it has a job, and attend opti-queue only when constraint-queue is empty). A job is deleted from both queues when it is scheduled from either one. The balance is achieved by observing that deadlines are not always urgent. There may be some job with a deadline in constraint-queue, but it may be able to wait for some time, called slack, without violating the deadline. Once the slack is known, the system can delay the job and attend opti-queue to improve the optimization metric.

The scheduling system of FIG. 3 manages both hard SLAs, i.e., deadlines or constraints, and a cost optimization metric, which is also called soft SLAs. The operating cost metrics are optimized while possible constraint violations are minimized. This is done by the dual-queue based component, where one queue handles the hard SLAs and the other queue handles the soft SLAs, and a system-monitoring component that efficiently monitors those queues.

The main challenge of CCOS is to efficiently monitor the slack of jobs in the constraint-queue, which is defined as follows based on the EDF scheduling policy. Given n jobs, J_i, 1 ≤ i ≤ n, in constraint-queue, where the job length of J_i is l_i, the deadline of J_i is d_i, and d_i ≤ d_j if i < j, the slack of J_i at time t is,

$s_{i} = {d_{i} - t - {\sum\limits_{k = 1}^{i}l_{k}}}$

s_i can be determined for 1 ≤ i ≤ n by iterating over jobs in the non-decreasing order of deadlines, and testing the minimum slack, i.e., min_i s_i, against a slack threshold s_th. If the minimum slack is less than or equal to s_th, jobs need to be removed from the constraint-queue 120. The only parameter of CCOS is s_th, and within a large range, i.e., [3*mean-job-length, 10*mean-job-length], the performance is not very sensitive to the parameter value. A data structure named slack tree can be used that supports fast minimum slack monitoring.
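The slack definition and the threshold test can be sketched directly in Python. This is a linear-scan illustration with hypothetical function names; it reads "jobs need to be removed from the constraint-queue" as "the next job is scheduled from the constraint-queue", which is an interpretation rather than something the text states outright.

```python
def slacks(jobs, now):
    """Slack of each job under EDF: s_i = d_i - t - sum_{k<=i} l_k.

    jobs: iterable of (length, deadline); returned in deadline order.
    """
    ordered = sorted(jobs, key=lambda job: job[1])
    result, work_ahead = [], 0.0
    for length, deadline in ordered:
        work_ahead += length      # includes the job's own length
        result.append(deadline - now - work_ahead)
    return result

def pick_queue(constraint_jobs, now, s_th):
    """Serve the constraint-queue only when the minimum slack is at or below s_th."""
    if constraint_jobs and min(slacks(constraint_jobs, now)) <= s_th:
        return "constraint-queue"
    return "opti-queue"

jobs = [(2.0, 10.0), (3.0, 6.0)]   # (length, deadline)
print(slacks(jobs, now=0.0))        # [3.0, 5.0]
print(pick_queue(jobs, now=0.0, s_th=1.0), pick_queue(jobs, now=0.0, s_th=3.0))
```

This scan costs O(n) per decision (plus the sort), which is the cost that an incrementally maintained structure is meant to avoid.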

An example of a slack tree is shown in FIG. 4. Each leaf node has a job, in the order of non-decreasing deadline from left to right. FIG. 4 shows a binary tree for illustration, but a slack tree can have arbitrary fan-out at each node. Each node maintains two values: left sibling execution time total (LSETT) and minimum slack in subtree (MSS). LSETT of a node is the total execution time, or total job length, of its left siblings, which are the nodes to the left sharing the same parent node. MSS of a node is the minimum slack in the subtree rooted at the node.

MSS of a leaf node node_i can be determined as MSS_i = d_i − l_i, where d_i is the deadline of node i's job and l_i is node i's job length. MSS of a non-leaf node node_i is recursively computed as:

$\mathrm{MSS}_{i} = \min_{\mathrm{node}_{j} \in \text{children of } \mathrm{node}_{i}} \left( \mathrm{MSS}_{j} - \mathrm{LSETT}_{j} \right)$

Root node's MSS represents the minimum slack of the whole tree.
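The recursion can be checked with a short Python sketch that builds the tree bottom-up from a deadline-sorted job list and returns the root's MSS. The function name and the way jobs are split among children are assumptions; the sketch only covers a static computation, not the incremental insert and delete updates, and it measures slack relative to time zero (subtract the current time t to obtain the slack at t).

```python
def subtree_mss(jobs, fanout=2):
    """Return (MSS, total job length) for a subtree over deadline-sorted jobs.

    jobs: list of (length, deadline), sorted by non-decreasing deadline.
    """
    if len(jobs) == 1:
        length, deadline = jobs[0]
        return deadline - length, length       # leaf: MSS = d - l
    chunk = -(-len(jobs) // fanout)            # ceiling division
    mss, lsett = float("inf"), 0.0
    for start in range(0, len(jobs), chunk):
        child_mss, child_total = subtree_mss(jobs[start:start + chunk], fanout)
        mss = min(mss, child_mss - lsett)      # MSS_j - LSETT_j
        lsett += child_total                   # becomes LSETT of the next sibling
    return mss, lsett

jobs = [(3.0, 6.0), (2.0, 10.0), (4.0, 12.0), (1.0, 20.0)]
print(subtree_mss(jobs))   # (3.0, 10.0)
```

The root value agrees with the direct computation min_i (d_i − Σ_{k≤i} l_k) for the same jobs, whichever fan-out is chosen.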

Since the slack tree has all jobs in constraint-queue as its leaf nodes, each insertion and deletion from the queue translates to an insertion and a deletion to the tree. The slack tree efficiently supports these frequent changes.

FIG. 5 shows an example where a new job J₉ is inserted into the slack tree. Underlined numbers indicate the information updated relative to FIG. 4. Based on its deadline, J₉ is inserted between J₃ and J₄. This triggers updates of LSETT and MSS of other nodes as follows. The parent node of J₉ updates its MSS value from 30 to 29, as the slack of J₄ is reduced by 1. Its updated MSS affects its parent node's MSS as well, updating it from 15 to 14. Its right sibling node is affected as well, such that its LSETT is increased by 1, from 30 to 31. These two nodes report their updated contributions to the root node's MSS, 14 and 9, respectively, and the root node updates its MSS from 10 to 9. Given the node fan-out of k, insertion takes k time at each level, and therefore it takes O(k·log_k n), or simply O(log n). Deletion is done in a similar fashion, giving the same time complexity.

The prior discussion addresses the problem of “which job to serve first” at a single server. With multiple such servers, a central dispatcher needs to decide to which server to send each job, given the objective of SLA penalty cost minimization. Assuming servers are homogeneous, the dispatch decision depends on the scheduling policy employed at the servers.

Next, cost-based dispatching is discussed. Some simple traditional dispatching policies include random and round robin. While being simple, these policies do not perform well, especially given highly variable job length, such as that in long tail distributions. Other more sophisticated policies include join-shortest-queue (JSQ) and least-work-left (LWL), where the former sends jobs to the server with the fewest jobs in the queue and the latter sends jobs to the server whose sum of job lengths in the queue is the least among all servers. In particular, LWL is a locally optimal policy in that each job will choose the server that will minimize its waiting time, though it does not necessarily minimize the total response time of all jobs. When job lengths are highly variable, as in heavy tail distributions, it has been shown that SITA (Size Interval Task Assignment) often outperforms LWL. In SITA, jobs are dispatched according to their job length, such that server-0 will get the smallest jobs, server-1 will get the next longer jobs, and the last server will get the longest jobs. In choosing the boundaries, SITA-E (SITA with Equal load), the most popular type of SITA, ensures that total work is equal across all servers. However, it has been observed that SITA-E does not necessarily minimize the average response time, and therefore SITA-U has been proposed, which unbalances the load to achieve optimal average response time.
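The three selection rules can be contrasted in a compact Python sketch. The function names and the representation of each server queue as a list of pending job lengths are assumptions for illustration; how the SITA cutoffs are chosen (SITA-E, SITA-U, or otherwise) is outside the scope of the sketch.

```python
from bisect import bisect_left

def jsq(queues):
    """Join-shortest-queue: index of the server with the fewest queued jobs."""
    return min(range(len(queues)), key=lambda s: len(queues[s]))

def lwl(queues):
    """Least-work-left: index of the server with the smallest total queued work."""
    return min(range(len(queues)), key=lambda s: sum(queues[s]))

def sita(job_length, cutoffs):
    """Size-interval assignment: server k takes lengths in (cutoffs[k-1], cutoffs[k]]."""
    return bisect_left(cutoffs, job_length)

# Each inner list holds the lengths of the jobs waiting at one server.
queues = [[5.0], [1.0, 1.0], [4.0, 0.5]]
print(jsq(queues), lwl(queues))                        # 0 1
print(sita(0.5, [1.0, 10.0]), sita(50.0, [1.0, 10.0]))  # 0 2
```

The example shows JSQ and LWL disagreeing: server 0 has the fewest jobs, but server 1 has the least outstanding work.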

SITA-E or SITA-U, however, may not be the best candidate for SLA-based dispatching, since they are not aware of the SLA penalty cost function and do not necessarily minimize the total SLA penalty cost. For instance, SITA-E would send equal load to all servers, but it may be the case that short jobs are more expensive in terms of SLA penalty cost, and the system may want to send less load to short-job servers than to long-job servers. Likewise, SITA-U may find its own optimal boundaries of splitting jobs according to length for response time minimization, but it may not be the best set of boundaries for cost minimization.

Finding the optimal boundaries for SITA-UC, unfortunately, is not an easy problem. To solve the problem, a simulation-based technique can be used for SITA-UC boundary tuning. In an exemplary case of two-server dispatching, the system needs to decide a single boundary that divides job size intervals into two. To do this, multiple boundaries can be tested between the shortest and the longest job lengths, and the boundary that gives the lowest cost can be used. An approximate assumption that the boundary-value-to-SLA-cost function is near-unimodal can be used. A function is unimodal if it has only one local minimum, which is its global minimum; and it is near-unimodal if it has multiple local minima, but they are all very close to the global minimum.

Tuning of a single boundary is done in two phases. In the first phase, the lower bound and upper bound of the global minimum are located. Starting from the boundary previously found, the process makes exponential jumps to the left, i.e., divides by 2 each time, to find the lower bound. When f(0.5x) > f(x), then 0.5x is the lower bound. Likewise, the system performs an upper bound search to the right using exponential jumps, i.e., multiplies by 2 each time, and when f(x) < f(2x), then 2x is the upper bound. With these two bounds, the system performs a narrowing-down search in the second phase. The system divides the interval bounded by lower bound x_LB and upper bound x_UB into three equal-length sections using two division points named x₁ and x₂. The system then evaluates f(x₁) and f(x₂) using two simulation runs. If f(x₁) < f(x₂), the global minimum is within [x_LB, x₂] and the next round of search is done where this interval is divided into three sections. If f(x₁) > f(x₂), then the global minimum is in [x₁, x_UB] and the search is limited to this smaller interval in the next round. This process is repeated until x_LB/x_UB is greater than a parameter StopPrecision, such as 0.9.
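The two-phase search can be sketched in Python with an arbitrary callable standing in for the simulation run that maps a boundary value to an SLA cost. The function names, the cap on the number of jumps, the treatment of the tie case f(x₁) = f(x₂), and the choice to return the midpoint of the final interval are all assumptions added for the sketch; the text does not specify them.

```python
def find_bound(f, x, factor, max_jumps=60):
    """Phase 1: jump by `factor` (0.5 or 2) until the cost starts rising."""
    for _ in range(max_jumps):          # cap guards against flat cost curves
        if f(factor * x) > f(x):
            break
        x *= factor
    return factor * x

def tune_boundary(f, x_start, stop_precision=0.9):
    """Two-phase search for the boundary minimizing a near-unimodal cost f."""
    x_lb = find_bound(f, x_start, 0.5)
    x_ub = find_bound(f, x_start, 2.0)
    # Phase 2: split [x_lb, x_ub] in three and discard one outer third.
    while x_lb / x_ub <= stop_precision:
        x1 = x_lb + (x_ub - x_lb) / 3.0
        x2 = x_lb + 2.0 * (x_ub - x_lb) / 3.0
        if f(x1) < f(x2):
            x_ub = x2                   # minimum lies in [x_lb, x2]
        else:
            x_lb = x1                   # minimum lies in [x1, x_ub] (ties included)
    return (x_lb + x_ub) / 2.0

# Stand-in cost curve with its minimum at a boundary of 3.0.
print(tune_boundary(lambda x: (x - 3.0) ** 2, x_start=1.0))
```

Because each evaluation of f would be a simulation run in practice, caching results for repeated boundary values is a natural refinement.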

For more than two servers, a single cutoff can be used to divide short jobs and long jobs, which is decided by the above SITA-UC cutoff search process. The servers are divided into two groups, one for short jobs and another for long jobs, and within each group LWL is used. The system can also generalize a dispatching policy in the context where each job may be served by only a subset of servers. In this case, capability groups are set up where jobs and servers belong to one of them, and SITA is run for each capability group.

Capacity planning is discussed next. The capacity planning process allocates resources in an intelligent way, considering factors such as job traffic and profit model, so that the total profit is maximized. The system uses observed job traffic patterns and unit server costs as the basis for the immediate future planning. Simply adding more servers will increase the operational cost. Therefore, the task of capacity planning is to identify the best allocation that maximizes the total profit.

Simulation-based capacity planning is used in one embodiment. The system has a discrete event simulator module that is responsible for finding optimum resource allocations through planned simulations.

Capacity planning simulation can be online or offline. In Offline Simulation for Capacity Planning, the simulation receives the given job characteristics, the numbers of servers to handle the jobs, and different operational costs. Simulation-based capacity planning relies on simulations to estimate, in an offline manner, the profits under different server numbers in order to decide the best setting. The inputs to the simulation are the profit model and job characteristics. For already running systems, those inputs are derived from the real query logs. At the initialization stage of a system, certain data statistics, such as the distribution of job inter-arrival time and that of job service time, are assumed unless they are provided by the client. One embodiment of the capacity planner uses the most frequently used distributions to initially characterize the data statistics. After that, it effectively refines those initial assumptions by constantly monitoring the system. This feature allows the system not to rely heavily on the client input on the data statistics to start with.
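The selection step of offline capacity planning can be sketched as a search over candidate server counts. Everything named here is hypothetical: `simulate_revenue` stands in for a run of the discrete event simulator that returns the revenue achieved with n servers, and the diminishing-returns curve in the example is made up purely to exercise the code.

```python
def plan_capacity(simulate_revenue, server_counts, server_cost_per_hour, hours):
    """Return (server_count, profit) maximizing revenue minus operational cost."""
    best = None
    for n in server_counts:
        # Profit P = total revenue - C, with C = n servers * unit cost * duration.
        profit = simulate_revenue(n) - n * server_cost_per_hour * hours
        if best is None or profit > best[1]:
            best = (n, profit)
    return best

# Made-up revenue curve: each extra server recovers half of the remaining revenue.
revenue = lambda n: 100.0 * (1.0 - 0.5 ** n)
print(plan_capacity(revenue, range(1, 6), server_cost_per_hour=10.0, hours=1.0))
```

With this curve the third server still pays for itself while the fourth does not, which illustrates why simply adding servers does not maximize profit.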

In offline simulations, data characteristics are assumed to be time invariant. Because such an assumption does not always hold true in cloud computing, the simulation results should be updated in real time. If time allows, the offline simulation can be repeated in real time. However, in many cases offline simulations are not acceptable, either because they take too many resources to run or because they take too long to give final answers. In other words, simulations conducted in real time should be quick and consume fewer resources. With a time budget, an online simulation can be done to estimate an approximate optimal solution by using ICDC's online simulation capabilities. The main ideas are (1) instead of multiple simulation runs at a server setting, one run is done, and (2) instead of checking all the possible server numbers, the system checks a subset of server numbers. The cost estimation is then computed from a polynomial regression, which handles both the variance due to the single run and the interpolation for the unchecked server settings.

FIG. 6 shows an exemplary process for prioritization and scheduling of incoming jobs based on cost functions. First, a new job is received (210). Next, the process removes one step from the cost function (220). The process then determines priority values and creates a segment (230) as illustrated in FIG. 2. The process checks if there are additional steps in the cost function (240) and, if so, loops back to 220 to handle the next step; otherwise it exits (250).
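The loop of FIG. 6 can be sketched as follows. This is an illustration under stated assumptions, not the disclosed priority computation: `placeholder_priority` is a hypothetical stand-in for the CBS priority value, the cost function is assumed to be a list of (deadline, penalty) steps, and each segment is assumed to be valid from the previous step's deadline to its own.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    job_id: str
    priority: float
    valid_from: float
    valid_until: float


def placeholder_priority(penalty: float, service_time: float) -> float:
    # Hypothetical stand-in; NOT the CBS priority formula of the disclosure.
    return penalty / service_time


def build_segments(job_id: str,
                   steps: List[Tuple[float, float]],
                   service_time: float,
                   arrival: float = 0.0) -> List[Segment]:
    """Turn a multi-step cost function into one segment per step.

    steps: (deadline, penalty) pairs; this representation is an assumption.
    """
    remaining = sorted(steps)          # process steps in deadline order
    segments: List[Segment] = []
    window_start = arrival
    while remaining:                   # 240: more steps left?
        deadline, penalty = remaining.pop(0)   # 220: remove one step
        segments.append(Segment(       # 230: priority value and segment
            job_id,
            placeholder_priority(penalty, service_time),
            window_start,              # assumed validity window start
            deadline,                  # assumed validity window end
        ))
        window_start = deadline
    return segments                    # 250: exit


segs = build_segments("q1", [(10.0, 2.0), (5.0, 4.0)], service_time=2.0)
```

Each returned segment carries its own priority and validity period, so the segments of one job can be queued independently.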

FIG. 7 shows an exemplary process to schedule jobs prioritized in FIG. 6. The process pulls a segment from the head of the priority queue (300). The process checks if the validity of the segment has expired (310), and if not, the process checks if the job has been scheduled (320). From 310 or 320, if the segment is invalid or the job has been scheduled, the process ignores the segment (330) and loops back to 300 to handle the next segment. From 320, if the job has not been scheduled, the process marks the other segments of the job as scheduled (340), and schedules the job (350).
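A minimal sketch of the FIG. 7 loop, using Python's `heapq` as the priority queue. Two simplifications are assumptions of this sketch: priorities are stored negated because `heapq` is a min-heap, and step 340 is modelled with a shared set of scheduled job identifiers rather than by marking each sibling segment individually. The tuple layout and the name `next_job` are hypothetical.

```python
import heapq
from typing import List, Optional, Set, Tuple

# Queue entry: (negated priority, valid_until, job_id).
Entry = Tuple[float, float, str]


def next_job(queue: List[Entry], scheduled: Set[str], now: float) -> Optional[str]:
    """Return the next job to run, or None if no valid segment remains."""
    while queue:
        _neg_priority, valid_until, job_id = heapq.heappop(queue)  # 300
        if now > valid_until:        # 310: validity period has expired
            continue                 # 330: ignore the segment
        if job_id in scheduled:      # 320: job already scheduled
            continue                 # 330: ignore the segment
        scheduled.add(job_id)        # 340: covers all other segments of the job
        return job_id                # 350: schedule the job
    return None


queue: List[Entry] = [
    (-9.0, 3.0, "a"),    # highest priority, but expired at now = 5.0
    (-7.0, 8.0, "b"),
    (-6.0, 9.0, "b"),    # later segment of the same job
    (-2.0, 10.0, "c"),
]
heapq.heapify(queue)
scheduled: Set[str] = set()
first = next_job(queue, scheduled, now=5.0)
second = next_job(queue, scheduled, now=5.0)
third = next_job(queue, scheduled, now=5.0)
```

Stale segments are discarded only when they reach the head of the queue, so no search through the queue is needed when a job is scheduled.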

The result of the foregoing is a data management platform which is hosted on an Infrastructure-as-a-Service in the cloud, such as Amazon EC2. The system optimizes a database service provider's profit while delivering the services according to customer SLAs. The system identifies the major relevant components of cloud service delivery architecture that need to be optimized to reach this goal.

The system explicitly considers the SLA penalty cost function of each job at the core of the scheduling, dispatching, and capacity planning problems to achieve an overall cost-optimal solution. The system provides a cost-based and constraint-conscious resource scheduling method, called incremental Cost-Based Scheduling (iCBS), for profit optimization. iCBS makes cost-based scheduling a feasible option through a substantial efficiency improvement.

The system can be applied to other SLA-based resource and workload management in cloud databases, such as job dropping, preempt-and-restart scheduling, and MPL tuning for the purpose of SLA profit optimization. From the cloud user perspective, SLA design will be a more complicated, but interesting, problem in the presence of such SLA profit optimizing techniques from the cloud service providers, for example, how a client should design its SLAs so that the provider will accept them while the client still gets a certain level of service reliably delivered, given the competition with other users.

Also, the cloud provider or the client may want to define additional constraints on certain jobs in addition to SLA penalty costs. For instance, cloud-service providers may want to provide differentiated quality of service to certain customers. The reasons could be various, e.g., i) the service provider's desire to provide some guarantee against starvation for all customers, such that no job will experience a delay greater than a pre-set threshold, ii) an explicit customer request in addition to the SLA-based price agreement, and iii) the service provider's internal planning among multiple service components of a provider. Such constraint enforcement along with SLA cost optimization is a valuable feature. The method runs on a framework called constraint-conscious optimization scheduling (CCOS) that can schedule jobs such that it enforces desired constraints with a marginal sacrifice on the optimization metric (e.g. SLA penalty cost).

Cost-based dispatching is optimally handled. The system dispatches jobs among multiple servers with a Size Interval Task Assignment (SITA)-based dispatching policy, called SITA-UC, for the purpose of cost minimization, and a SITA boundary tuning process is used.
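The size-interval idea behind a SITA policy can be sketched as below. This shows only generic size-interval assignment; the SITA-UC cost criterion and the boundary tuning process are not reproduced. The function name `dispatch`, the half-open interval convention, the boundary values, and the availability of an estimated job size are assumptions of the sketch.

```python
from bisect import bisect_right
from typing import List


def dispatch(job_size: float, boundaries: List[float]) -> int:
    """Return the index of the server whose size interval contains job_size.

    boundaries: ascending interval boundaries; k boundaries define k + 1
    servers. Server i takes sizes in [boundaries[i-1], boundaries[i]).
    """
    return bisect_right(boundaries, job_size)


# Hypothetical boundaries for three servers: small, medium, and large jobs.
boundaries = [1.0, 10.0]
assignments = [dispatch(size, boundaries) for size in (0.5, 1.0, 5.0, 50.0)]
```

Because each server sees only jobs within one size range, short jobs are not queued behind long ones; tuning then amounts to moving the boundary values.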

The system provides an effective and robust capacity planning framework for cloud resource management. The key elements of the framework are twofold: i) the capacity planner does not need to assume any distribution for user traffic and job lengths and ii) it works with cost-based scheduler and cost-based dispatcher modules in a tightly integrated manner to enable end-to-end profit optimization in the system. The system has been tested extensively, using real user access data from the Yahoo video site and TPC-H benchmarks.

The invention may be implemented in hardware, firmware or software, or a combination of the three. Preferably the invention is implemented in a computer program executed on a programmable computer having a processor, a data storage system, volatile and non-volatile memory and/or storage elements, at least one input device and at least one output device.

By way of example, a computer with digital signal processing capability to support the system is discussed next. The computer preferably includes a processor, random access memory (RAM), a program memory (preferably a writable read-only memory (ROM) such as a flash ROM) and an input/output (I/O) controller coupled by a CPU bus. The computer may optionally include a hard drive controller which is coupled to a hard disk and CPU bus. Hard disk may be used for storing application programs, such as the present invention, and data. Alternatively, application programs may be stored in RAM or ROM. I/O controller is coupled by means of an I/O bus to an I/O interface. I/O interface receives and transmits data in analog or digital form over communication links such as a serial link, local area network, wireless link, and parallel link. Optionally, a display, a keyboard and a pointing device (mouse) may also be connected to I/O bus. Alternatively, separate connections (separate buses) may be used for I/O interface, display, keyboard and pointing device. Programmable processing system may be preprogrammed or it may be programmed (and reprogrammed) by downloading a program from another source (e.g., a floppy disk, CD-ROM, or another computer).

Each computer program is tangibly stored in a machine-readable storage media or device (e.g., program memory or magnetic disk) readable by a general or special purpose programmable computer, for configuring and controlling operation of a computer when the storage media or device is read by the computer to perform the procedures described herein. The inventive system may also be considered to be embodied in a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.

The invention has been described herein in considerable detail in order to comply with the patent Statutes and to provide those skilled in the art with the information needed to apply the novel principles and to construct and use such specialized components as are required. However, it is to be understood that the invention can be carried out by specifically different equipment and devices, and that various modifications, both as to the equipment details and operating procedures, can be accomplished without departing from the scope of the invention itself.

1. A method for efficient maintenance of job prioritization for profit maximization in cloud-based service delivery infrastructures with multi-step cost structure support, comprising: breaking multiple steps in the SLA of a job into corresponding cost steps; generating a segmented cost function for each cost step; creating a cost-based-scheduling (CBS)-priority value associated with a validity period for each segment based on the segmented cost function; and choosing the job with the highest CBS priority value.
 2. The method of claim 1, comprising dividing each priority value by e^((t−t₀)/a), where t is the current time and t₀ is a fixed time instance.
 3. The method of claim 1, comprising inserting segmented-CBS-priority objects into a priority queue, where the objects are ordered by CBS priority, wherein the priority queue maintains a sorted list of jobs and performs job scheduling as a logarithmic-time operation by pulling the next job from a head of the priority queue.
 4. The method of claim 1, wherein among all segments corresponding to the same job, segment i has a higher CBS priority than segment j, where i<j.
 5. The method of claim 1, comprising chaining all segments of the same job together.
 6. The method of claim 1, wherein the chaining comprises adding a next segment pointer from a segmented-CBS-priority object i to i+1.
 7. The method of claim 1, comprising pulling a head of a segmented-CBS-priority queue, wherein the head has the highest priority value.
 8. The method of claim 1, comprising checking if a validity period for the job has expired or the job has been scheduled by an earlier segment.
 9. The method of claim 1, comprising discarding the segment and pulling the next head off the priority queue until a segmented-CBS-priority object is found that has an unexpired validity period and has not yet been scheduled.
 10. The method of claim 1, comprising marking other segments for the same job in the priority queue as scheduled.
 11. The method of claim 1, comprising monitoring a system status including system load, queue lengths, query response time, processor and input/output utilization.
 12. The method of claim 1, comprising dynamically adding or removing servers based on the system status to optimize operational cost while keeping the SLA costs in check.
 13. The method of claim 1, comprising performing capacity planning based on the system status.
 14. The method of claim 13, wherein the capacity planning is dynamic.
 15. The method of claim 1, comprising applying an Online Simulator for dynamic capacity planning through simulation.
 16. The method of claim 1, wherein the Online Simulator comprises offline and online modes.
 17. The method of claim 1, comprising deciding an order of execution of jobs at each server.
 18. The method of claim 1, comprising dispatching jobs to one or more servers based on a dispatching policy.
 19. The method of claim 1, wherein the dispatching policy is tuned according to dynamic changes including user traffic, addition or removal of processing nodes.
 20. The method of claim 19, wherein after client requests are dispatched to individual servers based on the dispatching policy, prioritizing dispatched jobs locally by forming a queue of queries from which a query is chosen and executed in a database. 